Minimum Error Classification Clustering

Authors

  • Iwan Tri Riyadi Yanto
  • Ahmad Dahlan

Abstract

Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. In this paper, we study the problem of clustering categorical data, where data objects are described by non-numerical attributes. We propose MECC (Minimum Error Classification Clustering), an alternative technique for categorical data clustering that uses the variable precision rough set (VPRS) model and takes minimum error classification into account. The technique is implemented in MATLAB. Experimental results on two benchmark UCI datasets show that MECC outperforms the baseline categorical data clustering techniques in selecting the clustering attribute.
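
The abstract does not say how MECC scores candidate clustering attributes. As a rough illustration of the VPRS ingredients such a technique builds on, the sketch below uses Ziarko's relative classification error c(X, Y) = 1 - |X ∩ Y| / |X| and the β-dependency (the fraction of objects in the β-positive region). It then picks the attribute whose partition classifies the other attributes best on average. The scoring rule (mean β-dependency), the function names, and the default β = 0.2 are assumptions made for illustration. This is not the authors' MECC procedure.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Sequence, Tuple

Row = Sequence[Hashable]


def partition(rows: Sequence[Row], attr: int) -> List[FrozenSet[int]]:
    """Equivalence classes U/{attr}: indices of objects sharing a value of attr."""
    classes: Dict[Hashable, set] = defaultdict(set)
    for i, row in enumerate(rows):
        classes[row[attr]].add(i)
    return [frozenset(c) for c in classes.values()]


def relative_error(x: FrozenSet[int], y: FrozenSet[int]) -> float:
    """Ziarko's relative classification error c(X, Y) = 1 - |X & Y| / |X|."""
    return 1.0 - len(x & y) / len(x) if x else 0.0


def beta_dependency(rows: Sequence[Row], cond: int, dec: int, beta: float) -> float:
    """Fraction of objects whose U/{cond} class lies in the beta-lower
    approximation of some U/{dec} class (the beta-positive region)."""
    targets = partition(rows, dec)
    positive = set()
    for e in partition(rows, cond):
        # With 0 <= beta < 0.5, e can fall under at most one target class.
        if any(relative_error(e, y) <= beta for y in targets):
            positive |= e
    return len(positive) / len(rows)


def select_clustering_attribute(
    rows: Sequence[Row], beta: float = 0.2
) -> Tuple[int, List[float]]:
    """Hypothetical scoring rule (not MECC): mean beta-dependency of every
    other attribute on candidate attribute a; highest mean wins, first on ties.
    Assumes at least two attributes."""
    n_attrs = len(rows[0])
    scores: List[float] = []
    for a in range(n_attrs):
        others = [b for b in range(n_attrs) if b != a]
        total = sum(beta_dependency(rows, a, b, beta) for b in others)
        scores.append(total / len(others))
    best = max(range(n_attrs), key=lambda a: scores[a])
    return best, scores


if __name__ == "__main__":
    # Toy categorical table with attributes (size, colour, texture).
    toy = [
        ("big", "blue", "hard"),
        ("big", "blue", "hard"),
        ("big", "red", "hard"),
        ("small", "red", "soft"),
        ("small", "red", "soft"),
        ("small", "blue", "soft"),
    ]
    print(select_clustering_attribute(toy, beta=0.2))  # (0, [0.5, 0.0, 0.5])
```

In this toy table, size and texture induce the same partition, while each colour class overlaps its best-matching size or texture class with a relative error of 1/3. At β = 0.2 that error is not tolerated, so colour scores 0.0 and size wins the tie with texture. Raising β to 0.4 admits the 1/3 error, and every attribute then scores 1.0. β is kept below 0.5 so that each equivalence class can fall under at most one target class.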

Similar resources

New Alternatives for k-Means Clustering

This work contains several theoretical and numerical studies on data clustering. The total squared error (TSE) between the data points and the nearest centroids is expressed as an analytic function, the gradient of that function is calculated, and the gradient descent method is used to minimize the TSE. In balance-constrained clustering, we optimize TSE, but so that the number of points in clus...

A New Simplified Gravitational Clustering Method for Multi-prototype Learning Based on Minimum Classification Error Training

In this paper, we propose a new simplified gravitational clustering method for multi-prototype learning based on minimum classification error (MCE) training. It simulates the process of the attraction and merging of objects due to their gravity force. The procedure is simplified by not considering velocity and multi-force attraction. The proposed hierarchical method does not depend on random in...

Classification of encrypted traffic for applications based on statistical features

Traffic classification plays an important role in many aspects of network management such as identifying type of the transferred data, detection of malware applications, applying policies to restrict network accesses and so on. Basic methods in this field were using some obvious traffic features like port number and protocol type to classify the traffic type. However, recent changes in applicat...

Comparison of classification and clustering methods in spatial rainfall pattern recognition at Northern Iran

Pattern recognition is the science of data structure and its classification. There are many classification and clustering methods prevalent in pattern recognition area. In this research, rainfall data in a region in Northern Iran are classified with natural breaks classification method and with a revised fuzzy c-means (FCM) algorithm as a clustering approach. To compare these two methods, the r...

Simultaneous Gene Clustering and Subset Selection for Sample Classification Via MDL

MOTIVATION The microarray technology allows for the simultaneous monitoring of thousands of genes for each sample. The high-dimensional gene expression data can be used to study similarities of gene expression profiles across different samples to form a gene clustering. The clusters may be indicative of genetic pathways. Parallel to gene clustering is the important application of sample classif...

Publication date: 2013